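While transfer learning from large-scale pre-trained language models has become prevalent in natural language processing, running these models in computationally constrained environments remains a challenging problem. Several solutions have been proposed, including knowledge distillation, network quantization, and network pruning; however, these approaches focus mostly on English, which widens the gap for low-resource languages. In this work, we introduce three light and fast distilled BERT models for the Romanian language: Distil-BERT-base-ro, Distil-RoBERT-base, and DistilMulti-BERT-base-ro. The first two models result from individually distilling the knowledge of two base versions of Romanian BERTs available in the literature, while the last one was obtained by distilling their ensemble. To our knowledge, this is the first attempt to create publicly available Romanian distilled BERT models, which were thoroughly evaluated on five tasks: part-of-speech tagging, named entity recognition, sentiment analysis, semantic textual similarity, and dialect identification. Experimental results on these benchmarks show that our three distilled models retain most of their teachers' performance in terms of accuracy, while being twice as fast on a GPU and ~35% smaller. In addition, we further test the similarity between the predictions of our students and those of their teachers by measuring their label and probability loyalty, together with regression loyalty, a new metric introduced in this work.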
translated by Google Translate
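One of the fundamental functionalities required for a socially assistive robot to be accepted is its ability to communicate with other agents in its environment. In the context of the ROBIN project, situational dialogue through voice interaction with a robot was investigated. This paper presents different speech recognition experiments with deep neural networks, focusing on producing models that are fast (under 100 ms latency from the network itself) while still reliable. Even though low latency is one of the key desired characteristics, the final deep neural network model achieves state-of-the-art results for recognizing the Romanian language, obtaining a 9.91% word error rate (WER) when combined with a language model, thus improving on previous results while also offering better runtime performance. Additionally, we explore two modules for correcting the ASR output (hyphen and capitalization restoration, and unknown word correction), targeting the goals of the ROBIN project (dialogue in closed micro-worlds). We design a modular architecture based on APIs, allowing an integration engine (either in the robot or external) to chain together the available modules as needed. Finally, we test the proposed design by integrating it into the RELATE platform, where speech is provided either by uploading a file or by recording new audio.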
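The word error rate quoted in this abstract is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal Python sketch of that standard definition; the function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not the evaluation code used by the authors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.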
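Automatically learned vector representations of words, also known as "word embeddings", are becoming a basic building block for more and more natural language processing algorithms. There are different ways and tools for constructing word embeddings. Most approaches rely on raw text, with word occurrences and/or letter n-grams as the construction items. More elaborate research uses additional linguistic features extracted after text preprocessing. Morphology is clearly served by vector representations constructed from raw text and letter n-grams. Studies of syntax and semantics may profit more from vector representations constructed with additional features associated with each word, such as lemma, part of speech, and syntactic or semantic dependents. One of the key objectives of the ReTeRom project is the development of advanced technologies for Romanian natural language processing, including morphological, syntactic, and semantic analysis of text. To that end, we plan to develop an open-access large library of ready-to-use word embedding sets, each set characterized by different parameters: the features used (wordforms, letter n-grams, lemmas, POSes, etc.), vector length, window/context size, and frequency thresholds. The previously created sets of word embeddings on the CoRoLa corpus (based on word occurrences) (Păiș and Tufiș, 2018) are therefore being further augmented with new representations learned from the same corpus using specific features such as lemmas and parts of speech. Furthermore, to better understand and explore the vectors, graphical representations will be made available through customized interfaces.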
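Ensuring proper punctuation and letter casing is a key pre-processing step towards applying complex natural language processing algorithms. This is especially important for text sources in which punctuation and casing are missing, such as the raw output of automatic speech recognition systems. Additionally, short text messages and micro-blogging platforms offer unreliable and often wrong punctuation and casing. This survey gives an overview of both historical and state-of-the-art techniques for restoring punctuation and correcting word casing. Furthermore, current challenges and research directions are highlighted.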
Reading comprehension of legal text can be a particularly challenging task due to the length and complexity of legal clauses and a shortage of expert-annotated datasets. To address this challenge, we introduce the Merger Agreement Understanding Dataset (MAUD), an expert-annotated reading comprehension dataset based on the American Bar Association's 2021 Public Target Deal Points Study, with over 39,000 examples and over 47,000 total annotations. Our fine-tuned Transformer baselines show promising results, with models performing well above random on most questions. However, on a large subset of questions, there is still room for significant improvement. As the only expert-annotated merger agreement dataset, MAUD is valuable as a benchmark for both the legal profession and the NLP community.
We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. When executing SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, we can reach 60% sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches.
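To make the semi-structured patterns mentioned above concrete: in a 2:4 pattern, two of every four consecutive weights are zero, and in a 4:8 pattern, four of every eight. The sketch below builds such a pattern with plain magnitude pruning, which is a much simpler criterion than SparseGPT and is used here only to illustrate the layout constraint; the function name `prune_n_of_m` and the flat-list weight representation are assumptions made for this example.

```python
def prune_n_of_m(weights, n=2, m=4):
    """Zero the n smallest-magnitude entries in each consecutive group of m.

    Illustration only: this is magnitude pruning, not the SparseGPT
    criterion. `weights` is a flat list whose length is a multiple of m.
    """
    if not 0 <= n <= m:
        raise ValueError("n must satisfy 0 <= n <= m")
    if len(weights) % m != 0:
        raise ValueError("number of weights must be a multiple of m")

    pruned = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Positions of the n entries with the smallest absolute value.
        drop = set(sorted(range(m), key=lambda k: abs(group[k]))[:n])
        pruned.extend(0.0 if k in drop else w for k, w in enumerate(group))
    return pruned
```

With the defaults (`n=2, m=4`) or with `n=4, m=8`, exactly half of the weights are zeroed, but the zeros are spread evenly across groups rather than placed freely as in unstructured pruning.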
Despite the success of large language models (LLMs) in various natural language processing (NLP) tasks, the knowledge stored in these models may inevitably be incomplete, out-of-date, or incorrect. This motivates the need to utilize external knowledge to assist LLMs. Unfortunately, current methods for incorporating external knowledge often require additional training or fine-tuning, which can be costly and may not be feasible for LLMs. To address this issue, we propose a novel post-processing approach, rethinking with retrieval (RR), which retrieves relevant external knowledge based on the decomposed reasoning steps obtained from chain-of-thought (CoT) prompting. This lightweight approach does not require additional training or fine-tuning and is not limited by the input length of LLMs. We evaluate the effectiveness of RR through extensive experiments with GPT-3 on three complex reasoning tasks: commonsense reasoning, temporal reasoning, and tabular reasoning. Our results show that RR can produce more faithful explanations and improve the performance of LLMs.
Model quantization enables the deployment of deep neural networks on resource-constrained devices. Vector quantization aims at reducing the model size by indexing model weights with full-precision embeddings, i.e., codewords, while the index needs to be restored to 32-bit during computation. Binary and other low-precision quantization methods can reduce the model size by up to 32$\times$, but at the cost of a considerable accuracy drop. In this paper, we propose an efficient framework for ternary quantization to produce smaller and more accurate compressed models. By integrating hyperspherical learning, pruning and reinitialization, our proposed Hyperspherical Quantization (HQ) method reduces the cosine distance between the full-precision and ternary weights, thus reducing the bias of the straight-through gradient estimator during ternary quantization. Compared with existing work at similar compression levels ($\sim$30$\times$, $\sim$40$\times$), our method significantly improves the test accuracy and reduces the model size.
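As a rough illustration of the quantity this abstract refers to, the sketch below maps full-precision weights to ternary values in {-1, 0, +1} with a fixed magnitude threshold and then measures the cosine distance between the two weight vectors. The thresholding rule, the threshold value, and the function names `ternarize` and `cosine_distance` are assumptions chosen for the example; this is not the HQ training procedure itself.

```python
import math


def ternarize(weights, threshold):
    """Map each weight to -1, 0, or +1 using a fixed magnitude threshold.

    Assumption: weights with |w| <= threshold become 0; the rest keep
    only their sign. Real methods typically learn or tune this rule.
    """
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


def cosine_distance(a, b):
    """Return 1 - cos(a, b) for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical full-precision weights, used only to exercise the functions.
full_precision = [0.8, -0.05, -0.6, 0.1]
ternary = ternarize(full_precision, threshold=0.2)
distance = cosine_distance(full_precision, ternary)
```

A smaller distance means the ternary vector points in nearly the same direction as the full-precision one, which is the condition the abstract links to a less biased straight-through gradient estimate.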
Most existing pruning works are resource-intensive, requiring retraining or fine-tuning of the pruned models to maintain accuracy. We propose a retraining-free pruning method based on hyperspherical learning and loss penalty terms. The proposed loss penalty term pushes some of the model weights far from zero, while the remaining weight values are pushed near zero and can be safely pruned with no need for retraining and a negligible accuracy drop. In addition, our proposed method can instantly recover the accuracy of a pruned model by replacing the pruned values with their mean value. Our method obtains state-of-the-art results in retraining-free pruning and is evaluated on ResNet-18/50 and MobileNetV2 with the ImageNet dataset. One can easily get a 50\% pruned ResNet-18 model with a 0.47\% accuracy drop. With fine-tuning, the experimental results show that our method can significantly boost the accuracy of the pruned models compared with existing works. For example, the accuracy of a 70\% pruned (except the first convolutional layer) MobileNetV2 model only drops 3.5\%, much less than the 7\% $\sim$ 10\% accuracy drop with conventional methods.
Most existing works use projection functions for ternary quantization in discrete space. Scaling factors and thresholds are used in some cases to improve the model accuracy. However, the gradients used for optimization are inaccurate and result in a notable accuracy gap between the full-precision and ternary models. To get more accurate gradients, some works gradually increase the discrete portion of the full-precision weights in the forward propagation pass, e.g., using a temperature-based sigmoid function. Instead of directly performing ternary quantization in discrete space, we push full-precision weights close to ternary ones through a regularization term prior to ternary quantization. In addition, inspired by the temperature-based method, we introduce a re-scaling factor to obtain more accurate gradients by simulating the derivatives of the sigmoid function. The experimental results show that our method can significantly improve the accuracy of ternary quantization in both image classification and object detection tasks.